AITopics | mixture encoder

Collaborating Authors

mixture encoder

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Vieting, Peter, Berger, Simon, von Neumann, Thilo, Boeddeker, Christoph, Schlüter, Ralf, Haeb-Umbach, Reinhold

arXiv.org Artificial IntelligenceSep-15-2023

Many real-life applications of automatic speech recognition (ASR) require processing of overlapped speech. A commonmethod involves first separating the speech into overlap-free streams and then performing ASR on the resulting signals. Recently, the inclusion of a mixture encoder in the ASR model has been proposed. This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation. Previously, however, the method only addressed two-speaker scenarios. In this work, we extend this approach to more natural meeting contexts featuring an arbitrary number of speakers and dynamic overlaps. We evaluate the performance using different speech separators, including the powerful TF-GridNet model. Our experiments show state-of-the-art performance on the LibriCSS dataset and highlight the advantages of the mixture encoder. Furthermore, they demonstrate the strong separation of TF-GridNet which largely closes the gap between previous methods and oracle separation.

encoder, mixture encoder, separation, (17 more...)

arXiv.org Artificial Intelligence

2309.08454

Country:

Europe > Germany (0.05)
Europe > Austria > Styria > Graz (0.05)
Europe > Greece (0.05)
(7 more...)

Genre: Research Report (0.50)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Mixture Encoder for Joint Speech Separation and Recognition

Berger, Simon, Vieting, Peter, Boeddeker, Christoph, Schlüter, Ralf, Haeb-Umbach, Reinhold

arXiv.org Artificial IntelligenceJun-21-2023

Multi-speaker automatic speech recognition (ASR) is crucial for many real-world applications, but it requires dedicated modeling techniques. Existing approaches can be divided into modular and end-to-end methods. Modular approaches separate speakers and recognize each of them with a single-speaker ASR system. End-to-end models process overlapped speech directly in a single, powerful neural network. This work proposes a middle-ground approach that leverages explicit speech separation similarly to the modular approach but also incorporates mixture speech information directly into the ASR module in order to mitigate the propagation of errors made by the speech separator. We also explore a way to exchange cross-speaker context information through a layer that combines information of the individual speakers. Our system is optimized through separate and joint training stages and achieves a relative improvement of 7% in word error rate over a purely modular setup on the SMS-WSJ task.

encoder, separation, speech recognition, (13 more...)

arXiv.org Artificial Intelligence

2306.12173

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Europe > Germany (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback